Corpus-based Acquisition of Collocational Prepositional Phrases

Authors

  • Gosse Bouma
  • Begoña Villada
Abstract

Collocational prepositional phrases like ten koste van (at the expense of), met het oog op (with an eye on), and onder het mom van (under the pretext of) are patterns of the form P-NP-P, which have a non-compositional semantics and which are syntactically rigid or idiosyncratic. We present a number of linguistic tests which set such items apart from regularly built prepositional phrases. To find candidate strings which should be included in a computational lexicon as collocational prepositional phrases, we extract all instances of the relevant pattern from a corpus annotated with POS tags. Next, we introduce a number of statistical tests (mutual information, log-likelihood, and χ²) to find those instances which behave like strong collocations. The strongest collocations according to the statistical tests are compared with lists of such items presented elsewhere and evaluated by human judges.
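
The three association measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the counts are arranged, so this sketch assumes a 2x2 contingency table for a pair (x, y), for example a preposition-noun sequence x and the following preposition y. All function names and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math

def contingency(c_xy, c_x, c_y, n):
    """Observed 2x2 table (o11, o12, o21, o22) for a pair (x, y).

    c_xy: count of x followed by y, c_x / c_y: marginal counts,
    n: total number of pairs in the corpus (all assumed inputs).
    """
    return (c_xy, c_x - c_xy, c_y - c_xy, n - c_x - c_y + c_xy)

def expected(table):
    """Expected cell counts under independence of x and y."""
    o11, o12, o21, o22 = table
    n = o11 + o12 + o21 + o22
    r1, r2 = o11 + o12, o21 + o22   # row totals
    c1, c2 = o11 + o21, o12 + o22   # column totals
    return (r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n)

def mutual_information(c_xy, c_x, c_y, n):
    """Pointwise mutual information in bits: log2(P(x,y) / (P(x) P(y)))."""
    return math.log2(c_xy * n / (c_x * c_y))

def log_likelihood(table):
    """Log-likelihood ratio statistic G^2 = 2 * sum(O * ln(O / E))."""
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(table, expected(table)) if o > 0)

def chi_square(table):
    """Pearson chi-square statistic: sum((O - E)^2 / E)."""
    return sum((o - e) ** 2 / e
               for o, e in zip(table, expected(table)) if e > 0)

# Hypothetical counts, for illustration only.
table = contingency(c_xy=10, c_x=20, c_y=20, n=1000)
scores = {
    "mi": mutual_information(10, 20, 20, 1000),
    "ll": log_likelihood(table),
    "chi2": chi_square(table),
}
```

Candidates would then be ranked by each score separately. The measures often disagree, since mutual information tends to favour rare pairs while log-likelihood is less sensitive to low counts, which is one reason to compare several of them.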

Similar resources

Automatic Distinction of Arguments and Modifiers: the Case of Prepositional Phrases

The automatic distinction of arguments and modifiers is a necessary step for the automatic acquisition of subcategorisation frames and argument structure. In this work, we report on supervised learning experiments to learn this distinction for the difficult case of prepositional phrases attached to the verb. We develop statistical indicators of linguistic diagnostics for argumenthood, and we appro...

Integration of Semantic and Syntactic Constraints for Structural Noun Phrase Disambiguation

A fundamental problem in Natural Language Processing is the integration of syntactic and semantic constraints. In this paper we describe a new approach for the integration of syntactic and semantic constraints which takes advantage of a learned memory model. Our model combines localist representations for the integration of constraints and distributed representations for learning semantic const...

A Corpus-based Analysis of Collocational Errors in the Iranian EFL Learners' Oral Production

Collocations are one of the areas generally considered problematic for EFL learners. Iranian learners of English, like other EFL learners, face various problems in producing oral collocations. An analysis of learners' spoken interlanguage indicates both the scope of the problem and the need for learners to spend more time and energy on mastering collocations. The present study specifically f...

CFN-based Semantic Role Labeling of Chinese Prepositional Phrase

Prepositional Phrases are often among the most frequent expressions in Chinese, but they have been ignored on the grounds of being syntactically promiscuous and semantically vacuous, and relegated to the ignominious rank of “stop word”. The Chinese FrameNet (CFN) is a lexical resource project developed by Shanxi University, Taiyuan, based on the principles of Frame Semantics and supported by co...

Publication date: 2001